Using Decision Tree Induction for Discovering Holes in Data

Authors

  • Bing Liu
  • Ke Wang
  • Lai-Fun Mun
  • Xin-Zhi Qi
Abstract

Existing research in machine learning and data mining has focused on finding rules or regularities among the data cases. Recently, it was shown that associations that are missing from the data may also be interesting. These missing associations are the holes or empty regions. The existing algorithm for discovering holes has a number of shortcomings. It requires each hole to contain no data point, which is too restrictive for many real-life applications. It also has a very high complexity and produces a huge number of holes. Additionally, the algorithm only works in a continuous space and does not allow any discrete or nominal attributes. These drawbacks limit its applications. In this paper, we propose a novel approach to overcome these shortcomings. This approach transforms the holes-discovery problem into a supervised learning task, and then uses the decision tree induction technique for discovering holes in data.
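
The abstract does not spell out how the holes-discovery problem becomes a supervised learning task. One plausible reading, assumed here purely for illustration, is to label the real cases as one class, add evenly spread synthetic "empty" cases as a second class, and grow a decision tree; regions dominated by the synthetic class are then candidate holes. The sketch below is a minimal pure-Python version of that idea, not the authors' algorithm. The names `gini_mass`, `best_split`, `find_holes`, and the parameters `max_depth`, `max_data_frac`, and `min_points` are all hypothetical.

```python
def gini_mass(n_data, n_empty):
    """Gini impurity of a two-class region, weighted by its point count."""
    total = n_data + n_empty
    return 0.0 if total == 0 else 2.0 * n_data * n_empty / total


def best_split(points, labels):
    """Return the (dimension, threshold) with the lowest summed Gini mass, or None."""
    best, best_score = None, None
    for d in range(len(points[0])):
        values = sorted({p[d] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for p, lab in zip(points, labels) if p[d] <= t]
            right = [lab for p, lab in zip(points, labels) if p[d] > t]
            score = (gini_mass(left.count(1), left.count(0))
                     + gini_mass(right.count(1), right.count(0)))
            if best_score is None or score < best_score:
                best, best_score = (d, t), score
    return best


def find_holes(points, labels, bounds, max_depth=3, max_data_frac=0.05, min_points=20):
    """Recursively split the box `bounds`; return sub-boxes almost free of real data."""
    n_total = len(labels)
    if n_total == 0:
        return []
    n_data = labels.count(1)
    if n_data / n_total <= max_data_frac:
        # Few or no real cases here: report a hole if enough points back it up.
        return [bounds] if n_total >= min_points else []
    can_split = max_depth > 0 and n_data < n_total
    split = best_split(points, labels) if can_split else None
    if split is None:
        return []
    d, t = split
    lo, hi = bounds[d]
    holes = []
    for is_left, side in ((True, (lo, t)), (False, (t, hi))):
        idx = [i for i, p in enumerate(points) if (p[d] <= t) == is_left]
        sub_bounds = list(bounds)
        sub_bounds[d] = side
        holes += find_holes([points[i] for i in idx], [labels[i] for i in idx],
                            sub_bounds, max_depth - 1, max_data_frac, min_points)
    return holes


# Toy data (hypothetical): real cases occupy only the left half of the unit square.
data = [((i + 0.5) / 40, (j + 0.5) / 10) for i in range(20) for j in range(10)]
# Synthetic "empty" cases: an even grid over the whole square (an assumption).
background = [((i + 0.5) / 20, (j + 0.5) / 10) for i in range(20) for j in range(10)]

points = data + background
labels = [1] * len(data) + [0] * len(background)  # 1 = real case, 0 = synthetic
holes = find_holes(points, labels, [(0.0, 1.0), (0.0, 1.0)])
for hole in holes:
    print(hole)
```

In this toy setup the right half of the square should come back as a hole. Because `max_data_frac` tolerates a small share of real cases inside a reported region, the sketch also reflects the abstract's point that requiring completely empty holes is too restrictive. Nominal attributes would need an equality-based split, which is omitted here.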


Similar resources

DIAGNOSIS OF BREAST LESIONS USING THE LOCAL CHAN-VESE MODEL, HIERARCHICAL FUZZY PARTITIONING AND FUZZY DECISION TREE INDUCTION

Breast cancer is one of the leading causes of death among women. Mammography remains today the best technology to detect breast cancer, early and efficiently, to distinguish between benign and malignant diseases. Several techniques in image processing and analysis have been developed to address this problem. In this paper, we propose a new solution to the problem of computer aided detection and...

An Integrated DEA and Data Mining Approach for Performance Assessment

This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...

Comparing different stopping criteria for fuzzy decision tree induction through IDFID3

Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm uses stopping criteria to halt the tree's growth early, the threshold values of those criteria control the number of nodes. Finding a proper threshold value for a stopping crite...

A New Acceptance Sampling Design Using Bayesian Modeling and Backwards Induction

In acceptance sampling plans, the decision to accept or reject a specific batch is still a challenging problem. To provide a desired level of protection for customers as well as manufacturers, this paper proposes a new acceptance sampling design that accepts or rejects a batch based on Bayesian modeling to update the distribution function of the percentage of nonconfor...

Designing an intelligent system for predicting chromosomal genetic diseases using data mining

Background and Aim: Today we are witnessing tremendous advances in medical data mining. The data, by analyzing and discovering the relationships between them, can lead to algorithms that help us prevent or treat many diseases. Meanwhile, genetic diseases have attracted a large part of the attention of the medical world because the birth of children with genetic disorders imposes a great financi...



Publication date: 1998